246 research outputs found

    Hybrid rule-based - example-based MT: feeding Apertium with sub-sentential translation units

    This paper describes a hybrid machine translation (MT) approach that integrates bilingual chunks (sub-sentential translation units) obtained from parallel corpora into an MT system built with Apertium, a free/open-source rule-based machine translation platform that uses a shallow-transfer translation approach. In integrating the bilingual chunks, special care has been taken not to break the application of the existing Apertium structural transfer rules, since this would increase the number of ungrammatical translations. The method consists of (i) the application of a dynamic-programming algorithm to compute the best translation coverage of the input sentence given the collection of bilingual chunks available; (ii) the translation of the input sentence as usual by Apertium; and (iii) the application of a language model to choose one of the possible translations for each of the bilingual chunks detected. Results are reported for English-to-Spanish translation, and vice versa, using marker-based bilingual chunks automatically obtained from parallel corpora.
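
    Step (i) can be illustrated with a minimal sketch. It assumes that "best coverage" means covering as many input words as possible with non-overlapping chunks; the abstract does not state the actual criterion, and the function name, the `max_len` parameter and the toy chunk set are hypothetical, not taken from the paper.

```python
def best_coverage(words, chunks, max_len=5):
    """Return (covered_word_count, spans) for a best non-overlapping cover.

    `chunks` is a set of source-side chunks, each a tuple of words.
    Assumption: "best" = most words covered (not necessarily the paper's criterion).
    """
    n = len(words)
    # best[i] = (words covered within words[:i], chunk spans chosen so far)
    best = [(0, []) for _ in range(n + 1)]
    for i in range(1, n + 1):
        # Option 1: leave word i-1 uncovered and inherit the previous optimum.
        best[i] = best[i - 1]
        # Option 2: end a known chunk exactly at position i.
        for length in range(1, min(max_len, i) + 1):
            start = i - length
            if tuple(words[start:i]) in chunks:
                covered = best[start][0] + length
                if covered > best[i][0]:
                    best[i] = (covered, best[start][1] + [(start, i)])
    return best[n]


# Toy data: two of the chunks overlap on "big", so only one of them can be used.
toy_chunks = {("the", "big"), ("big", "house", "is"), ("very", "old")}
sentence = "the big house is very old".split()
print(best_coverage(sentence, toy_chunks))  # (5, [(1, 4), (4, 6)])
```

    In this toy case the longer chunk "big house is" wins over "the big" because it covers more words; words left uncovered would be translated by the rule-based system as usual.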

    Building machine translation systems for minor languages: challenges and effects

    Building machine translation systems for disadvantaged languages, which I will call minor languages, poses a number of challenges whilst also opening the door to new opportunities. After defining a few basic concepts, such as minor language and machine translation, the paper provides a brief overview of the types of machine translation available today, their most common uses, the type of data they are based on, and the usage rights and licences of machine translation software and data. It then discusses the challenges involved in building machine translation systems, as well as the effects these systems can have on the status of minor languages, drawing on examples from minor languages in Europe.

    Tecnologías de la Traducción: Actividades opcionales [Translation Technologies: Optional Activities]

    Optional activities on translation technologies.

    La traducció automàtica en la pràctica: aplicacions, dificultats i estratègies de desenvolupament [Machine translation in practice: applications, difficulties and development strategies]

    This article describes machine translation systems, their current applications, and the main difficulties that this language technology has to face. It presents Apertium, an open-source machine translation platform on which several machine translators have been built for different language pairs, some of them involving Catalan. Drawing on the authors' experience, it describes some of the tensions that arise when developing the linguistic data of a machine translator and the compromise solutions that must be reached in order to build useful systems.

    OpenMaTrEx: a free/open-source marker-driven example-based machine translation system

    We describe OpenMaTrEx, a free/open-source example-based machine translation (EBMT) system based on the marker hypothesis. It comprises a marker-driven chunker, a collection of chunk aligners, and two engines: one based on a simple proof-of-concept monotone EBMT recombinator and the other on a Moses-based statistical decoder. OpenMaTrEx is a free/open-source release of the basic components of MaTrEx, the Dublin City University machine translation system.
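
    The idea behind a marker-driven chunker can be shown with a small sketch. It assumes the usual reading of the marker hypothesis: a new chunk starts at a closed-class "marker" word, and every chunk must contain at least one non-marker word. This is not OpenMaTrEx's actual chunker; the `MARKERS` list and the function name are hypothetical and chosen only for illustration.

```python
# Toy, hypothetical marker list; a real system uses per-language marker files.
MARKERS = {"the", "a", "in", "on", "of", "and", "to", "with"}


def marker_chunks(words, markers=MARKERS):
    """Split `words` into chunks that each start at a marker word.

    A chunk is only closed once it holds at least one non-marker word,
    so runs of consecutive markers stay together in the same chunk.
    """
    chunks, current, has_content = [], [], False
    for word in words:
        is_marker = word.lower() in markers
        if is_marker and has_content:
            # A marker after content words opens a new chunk.
            chunks.append(current)
            current, has_content = [], False
        current.append(word)
        if not is_marker:
            has_content = True
    if current:
        if has_content or not chunks:
            chunks.append(current)
        else:
            # Trailing markers with no content word join the previous chunk.
            chunks[-1].extend(current)
    return chunks


print(marker_chunks("the cat sat on the mat in the house".split()))
```

    For the sentence above, the sketch yields the three chunks "the cat sat", "on the mat" and "in the house".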

    Bilingual dictionary generation and enrichment via graph exploration

    In recent years, we have witnessed a steady growth of linguistic information represented and exposed as linked data on the Web. Such linguistic linked data have stimulated the development and use of openly available linguistic knowledge graphs, as is the case with the Apertium RDF, a collection of interconnected bilingual dictionaries represented and accessible through Semantic Web standards. In this work, we explore techniques that exploit the graph nature of bilingual dictionaries to automatically infer new links (translations). We build upon a cycle-density-based method by partitioning the graph into biconnected components for a speed-up, and by simplifying the pipeline through a careful structural analysis that reduces hyperparameter tuning requirements. We also analyse the shortcomings of traditional evaluation metrics used for translation inference and propose to complement them with new ones, both-word precision (BWP) and both-word recall (BWR), aimed at being more informative of algorithmic improvements. Across twenty-seven language pairs, our algorithm produces, from scratch and within a minute, dictionaries about 70% the size of the existing Apertium RDF dictionaries at a high BWP of 85%. Human evaluation shows that 78% of the additional translations generated for dictionary enrichment are correct as well. We further describe an interesting use case: inferring synonyms within a single language, on which our initial human-based evaluation shows an average accuracy of 84%. We release our tool as free/open-source software which can not only be applied to RDF data and Apertium dictionaries, but is also easily usable for other formats and communities.
    This work was partially funded by the Prêt-à-LLOD project within the European Union’s Horizon 2020 research and innovation programme under grant agreement no. 825182. This article is also based upon work from COST Action CA18209 NexusLinguarum, “European network for Web-centred linguistic data science”, supported by COST (European Cooperation in Science and Technology). It has also been partially supported by the Spanish projects TIN2016-78011-C4-3-R and PID2020-113903RB-I00 (AEI/FEDER, UE), by DGA/FEDER, and by the Agencia Estatal de Investigación of the Spanish Ministry of Economy and Competitiveness and the European Social Fund through the “Ramón y Cajal” program (RYC2019-028112-I).

    Using Machine Translation to Provide Target-Language Edit Hints in Computer Aided Translation Based on Translation Memories

    This paper explores the use of general-purpose machine translation (MT) in assisting the users of computer-aided translation (CAT) systems based on translation memory (TM) to identify the target words in the translation proposals that need to be changed (either replaced or removed) or kept unedited, a task we term "word-keeping recommendation". MT is used as a black box to align source and target sub-segments on the fly in the translation units (TUs) suggested to the user. Source-language (SL) and target-language (TL) segments in the matching TUs are segmented into overlapping sub-segments of variable length and machine-translated into the TL and the SL, respectively. The bilingual sub-segments obtained and the matching between the SL segment in the TU and the segment to be translated are employed to build the features that are then used by a binary classifier to determine the target words to be changed and those to be kept unedited. In this approach, MT results are never presented to the translator. Two approaches are presented in this work: one using a word-keeping recommendation system which can be trained on the TM used with the CAT system, and a more basic approach which does not require any training. Experiments are conducted by simulating the translation of texts in several language pairs with corpora belonging to different domains and using three different MT systems. We compare the performance obtained to that of previous works that have used statistical word alignment for word-keeping recommendation, and show that the MT-based approaches presented in this paper are more accurate in most scenarios. In particular, our results confirm that the MT-based approaches are better than the alignment-based approach when using models trained on out-of-domain TMs. Additional experiments were performed to check how dependent the MT-based recommender is on the language pair and MT system used for training. These experiments confirm a high degree of reusability of the recommendation models across various MT systems, but a low level of reusability across language pairs.
    This work is supported by the Spanish government through projects TIN2009-14009-C02-01 and TIN2012-32615.
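
    The segmentation into overlapping sub-segments of variable length can be sketched as follows. This covers only the segmentation step; the MT calls, feature extraction and classifier are not shown. The function name and the `max_len` limit are assumptions made for illustration, since the abstract does not give the maximum sub-segment length.

```python
def sub_segments(words, max_len=3):
    """Return (start, end, tokens) for every sub-segment of 1..max_len words.

    Sub-segments overlap: each word appears in several of them, which is
    what allows word-level evidence to be gathered from many MT outputs.
    """
    return [
        (start, start + length, words[start:start + length])
        for length in range(1, max_len + 1)
        for start in range(len(words) - length + 1)
    ]


# Each sub-segment keeps its position so it can be mapped back to the words.
for start, end, tokens in sub_segments("the green house".split(), max_len=2):
    print(start, end, " ".join(tokens))
```

    Keeping the start and end offsets with each sub-segment is a design choice of this sketch: it lets the evidence obtained for a sub-segment be attributed to the individual words it spans.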